Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data
Similarity-based approaches represent a promising direction for time series
analysis. However, many such methods rely on parameter tuning, and some have
shortcomings if the time series are multivariate (MTS), due to dependencies
between attributes, or if the time series contain missing data. In this paper, we
address these challenges within the powerful context of kernel methods by
proposing the robust \emph{time series cluster kernel} (TCK). The approach
taken leverages the missing data handling properties of Gaussian mixture models
(GMM) augmented with informative prior distributions. An ensemble learning
approach is exploited to ensure robustness to parameters by combining the
clustering results of many GMMs to form the final kernel.
We evaluate the TCK on synthetic and real data and compare to other
state-of-the-art techniques. The experimental results demonstrate that the TCK
is robust to parameter choices, provides competitive results for MTS without
missing data and outstanding results for missing data.Comment: 23 pages, 6 figure
Classification of postoperative surgical site infections from blood measurements with missing data using recurrent neural networks
Clinical measurements that can be represented as time series constitute an
important fraction of the electronic health records and are often both
uncertain and incomplete. Recurrent neural networks are a special class of
neural networks that are particularly suitable to process time series data but,
in their original formulation, cannot explicitly deal with missing data. In
this paper, we explore imputation strategies for handling missing values in
classifiers based on recurrent neural network (RNN) and apply a recently
proposed recurrent architecture, the Gated Recurrent Unit with Decay,
specifically designed to handle missing data. We focus on the problem of
detecting surgical site infection in patients by analyzing time series of their
blood sample measurements and we compare the results obtained with different
RNN-based classifiers.
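A minimal sketch of one of the imputation strategies explored in settings like this (my illustration, not the paper's specific pipeline): forward-fill missing values along the time axis, fall back to the feature mean for leading gaps, and keep a binary observation mask that an RNN can consume alongside the imputed values.

```python
import numpy as np

def impute_forward_fill(x):
    """Forward-fill NaNs in a (time, features) array; leading NaNs are
    replaced by the per-feature mean of the observed values. Returns the
    imputed array and a binary mask (1 = observed, 0 = missing)."""
    x = x.copy()
    missing = np.isnan(x)
    for t in range(1, len(x)):
        # carry the previous time step forward where the current one is NaN
        x[t] = np.where(np.isnan(x[t]), x[t - 1], x[t])
    col_mean = np.nanmean(x, axis=0)          # fill any remaining leading NaNs
    x = np.where(np.isnan(x), col_mean, x)
    return x, (~missing).astype(float)

series = np.array([[np.nan, 1.0],
                   [2.0,    np.nan],
                   [3.0,    np.nan]])
filled, observed = impute_forward_fill(series)
# filled[1, 1] == 1.0 (carried forward), filled[0, 0] == 2.5 (column mean)
```

Feeding the mask together with the imputed values lets the classifier learn that a carried-forward measurement is less trustworthy than a fresh one, which is also the intuition behind the decay mechanism in GRU-D.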
Uncertainty-Aware Deep Ensembles for Reliable and Explainable Predictions of Clinical Time Series
Deep learning-based support systems have demonstrated encouraging results in
numerous clinical applications involving the processing of time series data.
While such systems often are very accurate, they have no inherent mechanism for
explaining what influenced the predictions, which is critical for clinical
tasks. However, existing explainability techniques lack an important component
for trustworthy and reliable decision support, namely a notion of uncertainty.
In this paper, we address this lack of uncertainty by proposing a deep ensemble
approach in which a collection of deep neural networks is trained independently. A measure of
uncertainty in the relevance scores is computed by taking the standard
deviation across the relevance scores produced by each model in the ensemble,
which in turn is used to make the explanations more reliable. The class
activation mapping method is used to assign a relevance score for each time
step in the time series. Results demonstrate that the proposed ensemble is more
accurate in locating relevant time steps and is more consistent across random
initializations, thus making the model more trustworthy. The proposed
methodology paves the way for constructing trustworthy and dependable support
systems for processing clinical time series for healthcare related tasks.
Comment: 11 pages, 9 figures, code at https://github.com/Wickstrom/TimeSeriesXA
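The uncertainty computation described above reduces to simple statistics over the ensemble's relevance scores. The sketch below assumes hypothetical relevance arrays (in the paper these come from class activation mapping on each trained network); the final down-weighting step is my own illustration of how the standard deviation can be used to make explanations more reliable.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for relevance scores: 5 ensemble members, 100 time steps each.
# In the paper, each row would come from class activation mapping on one DNN.
relevance = rng.random((5, 100))

mean_rel = relevance.mean(axis=0)   # consensus relevance per time step
std_rel = relevance.std(axis=0)     # disagreement across members = uncertainty

# One possible use: suppress time steps the ensemble disagrees on
reliable_rel = mean_rel * (1.0 - std_rel / std_rel.max())
```

Time steps with a high mean but low standard deviation are the ones the ensemble consistently flags as relevant, which is exactly the consistency property the results section measures across random initializations.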
Noisy multi-label semi-supervised dimensionality reduction
Noisy labeled data represent a rich source of information that is often
easily accessible and cheap to obtain, but label noise can also have many
negative consequences if not accounted for. How to fully utilize noisy labels
has been studied extensively within the framework of standard supervised
machine learning over a period of several decades. However, very little
research has been conducted on solving the challenge posed by noisy labels in
non-standard settings. This includes situations where only a fraction of the
samples are labeled (semi-supervised) and each high-dimensional sample is
associated with multiple labels. In this work, we present a novel
semi-supervised and multi-label dimensionality reduction method that
effectively utilizes information from both noisy multi-labels and unlabeled
data. With the proposed Noisy multi-label semi-supervised dimensionality
reduction (NMLSDR) method, the noisy multi-labels are denoised and unlabeled
data are labeled simultaneously via a specially designed label propagation
algorithm. NMLSDR then learns a projection matrix for reducing the
dimensionality by maximizing the dependence between the enlarged and denoised
multi-label space and the features in the projected space. Extensive
experiments on synthetic data, benchmark datasets, as well as a real-world case
study, demonstrate the effectiveness of the proposed algorithm and show that it
outperforms state-of-the-art multi-label feature extraction algorithms.
Comment: 38 pages
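The label propagation step can be illustrated with the classic iterative update F ← αSF + (1−α)Y on a similarity graph. This is a generic sketch of that family of algorithms, not the specially designed propagation in NMLSDR: here S is a hypothetical row-normalized similarity matrix and Y holds the (noisy or partial) multi-label assignments, with zero rows for unlabeled samples.

```python
import numpy as np

def propagate_labels(S, Y, alpha=0.9, n_iter=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y.
    S: (n, n) row-normalized similarity; Y: (n, c) initial label matrix
    with zero rows for unlabeled samples."""
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F

# Toy graph: samples 0-1 are similar, samples 2-3 are similar;
# only samples 0 and 3 carry labels.
S = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
Y = np.array([[1.0, 0.0],   # class 0
              [0.0, 0.0],   # unlabeled
              [0.0, 0.0],   # unlabeled
              [0.0, 1.0]])  # class 1
F = propagate_labels(S, Y)
labels = F.argmax(axis=1)   # -> [0, 0, 1, 1]
```

The retained (1−α)Y term anchors each labeled sample to its original label, which is what allows the same machinery to denoise labels rather than merely spread them, and the propagated F feeds the subsequent dependence-maximizing projection.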
Approaching adverse event detection utilizing transformers on clinical time-series
Patients admitted to a hospital typically follow a certain expected clinical
development during their stay. However, there is always a risk that a patient
receives the wrong diagnosis or that a treatment does not have the desired
effect, potentially leading to adverse events. Our
research aims to develop an anomaly detection system for identifying deviations
from expected clinical trajectories. To address this goal we analyzed 16 months
of vital sign recordings obtained from the Nordland Hospital Trust (NHT). We
employed a self-supervised framework based on the STraTS transformer
architecture to represent the time series data in a latent space. These
representations were then subjected to various clustering techniques to explore
potential patient phenotypes based on their clinical progress. While our
preliminary results from this ongoing research are promising, they underscore
the importance of enhancing the dataset with additional demographic information
from patients. This additional data will be crucial for a more comprehensive
evaluation of the method's performance.
Comment: 10 pages, 6 figures
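The second stage described above, clustering latent representations to look for patient phenotypes, can be sketched as follows. The embeddings here are synthetic stand-ins (the real ones come from the self-supervised STraTS encoder), and k-means is just one of the "various clustering techniques" the abstract mentions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for transformer embeddings: two well-separated groups of
# patients in a 16-dimensional latent space
emb = np.vstack([rng.normal(0.0, 0.1, size=(30, 16)),
                 rng.normal(5.0, 0.1, size=(30, 16))])

# Each cluster is a candidate phenotype based on clinical progress
phenotypes = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
```

In practice the number of clusters is unknown, which is one reason the abstract stresses that richer demographic data are needed before the discovered groups can be meaningfully evaluated.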
Time series cluster kernels to exploit informative missingness and incomplete label information
The time series cluster kernel (TCK) provides a powerful tool for analysing multivariate time series subject to missing data. TCK is designed using an ensemble learning approach in which Bayesian mixture
models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing
values without resorting to imputation, and the ensemble strategy ensures robustness to hyperparameters, making it particularly well suited for unsupervised learning.
However, TCK assumes that values are missing at random and that the underlying missingness mechanism is ignorable, i.e. uninformative, an assumption that does not hold in many real-world applications, such as
medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our
approach, we create a representation of the missing pattern, which is incorporated into mixed mode mixture models in such a way that the information provided by the missing patterns is effectively exploited.
Moreover, we also propose a semi-supervised kernel, capable of taking advantage of incomplete label
information to learn more accurate similarities.
Experiments on benchmark data, as well as a real-world case study of patients described by longitudinal
electronic health record data who potentially suffer from hospital-acquired infections, demonstrate the
effectiveness of the proposed method.
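The first step, creating an explicit representation of the missing pattern, can be illustrated in a very reduced form. This sketch simply concatenates a binary observation mask to each (flattened, zero-imputed) series so that downstream similarity computations can compare missingness patterns as well as observed values; the paper instead feeds such a representation into mixed-mode mixture models.

```python
import numpy as np

def augment_with_mask(X):
    """Concatenate a binary missingness indicator (1 = observed,
    0 = missing) to each row, after zero-imputing the NaNs, so that
    similarities also reflect *which* values were missing."""
    mask = (~np.isnan(X)).astype(float)
    filled = np.where(np.isnan(X), 0.0, X)
    return np.concatenate([filled, mask], axis=1)

X = np.array([[1.0, np.nan, 3.0],
              [1.0, 2.0,   np.nan]])
A = augment_with_mask(X)
# A[0] = [1., 0., 3., 1., 0., 1.]
```

Two patients with identical observed values but different missingness patterns now map to different representations, which is precisely how informative missingness, e.g. which blood tests a clinician chose to order, becomes usable signal rather than noise.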